Reward Propagation
Reward Propagation Using Graph Convolutional Networks
Potential-based reward shaping provides a principled way to design reward functions that speed up learning. However, automatically finding potential functions for complex environments is difficult (in fact, as difficult as learning a value function from scratch). We propose a new framework for learning potential functions by leveraging ideas from graph representation learning. Our approach relies on Graph Convolutional Networks (GCNs), which we combine with the probabilistic-inference view of reinforcement learning. More precisely, we use GCNs to perform message passing from rewarding states; the propagated messages then serve as potential functions for reward shaping, accelerating learning. We verify empirically that our approach achieves considerable improvements in both small-scale and high-dimensional control problems.
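As a concrete illustration of the mechanism the abstract describes, here is a minimal sketch (not the authors' implementation): reward signal is propagated over an empirical transition graph with GCN-style normalized message passing, and the result is used as a potential for reward shaping. The function names and the fixed, unlearned propagation are illustrative assumptions; a trained GCN would replace the hand-rolled loop.

```python
import numpy as np

def propagate_potentials(adj, state_rewards, num_hops=3):
    """GCN-style message passing: repeatedly multiply by the symmetrically
    normalized adjacency matrix (as in Kipf & Welling's GCN) while
    re-injecting the reward, so potential spreads out from rewarding states."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    phi = state_rewards.astype(float)
    for _ in range(num_hops):
        phi = a_norm @ phi + state_rewards
    return phi

def shaped_reward(r, phi_s, phi_next, gamma=0.99):
    # Potential-based shaping (Ng et al., 1999): F(s, s') = gamma*phi(s') - phi(s),
    # which densifies the reward without changing the optimal policy.
    return r + gamma * phi_next - phi_s

# Toy 4-state chain where only the last state is rewarding.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
phi = propagate_potentials(adj, np.array([0.0, 0.0, 0.0, 1.0]))
print(shaped_reward(0.0, phi[0], phi[1]))  # early transitions now carry signal
```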
Review for NeurIPS paper: Reward Propagation Using Graph Convolutional Networks
The method relies on a trajectory-sampled approximation of the transition graph. But given the trajectory samples, sequential models (RNNs, etc.) would suffice to estimate the potential functions. It would be good if the authors could clarify the advantage and necessity of a GCN over sequential models on the sampled-trajectory inputs. The baselines, ICM and RND, are motivated by hard-exploration RL tasks, whereas potential-based reward shaping is motivated by faster convergence; they are related but address different issues. A more informative empirical comparison would be against LIRPG (Learning Intrinsic Rewards for Policy Gradient, [a]), because both this paper and LIRPG aim to learn reward shaping that speeds up policy learning.
Generalizing LTL Instructions via Future Dependent Options
In many real-world applications in control systems and robotics, linear temporal logic (LTL) is a widely used task-specification language whose compositional grammar naturally induces temporally extended behaviours across tasks, including conditionals and alternative realizations. An important problem in RL with LTL tasks is learning task-conditioned policies that generalize zero-shot to LTL instructions not seen during training. However, because symbolic observations are often lossy and LTL tasks can have long time horizons, previous works can suffer from sample inefficiency during training and from infeasibility or sub-optimality of the solutions found. To tackle these issues, this paper proposes a novel multi-task RL algorithm with improved learning efficiency and optimality. To achieve global optimality of task completion, we propose to learn options that depend on future subgoals via a novel off-policy approach. To propagate the rewards of satisfying future subgoals backward more efficiently, we train a multi-step value function conditioned on the subgoal sequence, updated with Monte Carlo estimates of multi-step discounted returns. In experiments on three different domains, we evaluate the LTL generalization capability of agents trained by the proposed method, showing its advantage over previous representative methods.
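To make the value-learning step concrete, here is a hypothetical sketch of the Monte Carlo targets described above: from a completed trajectory and the subgoals it satisfied, build discounted multi-step return targets, each paired with the subgoal sequence still pending at that step. The function name, input format, and pairing convention are assumptions, not the paper's actual interface.

```python
def monte_carlo_targets(rewards, subgoal_hits, gamma=0.99):
    """rewards[t]: reward at step t; subgoal_hits[t]: index of the subgoal
    (if any) satisfied at step t, else None. Returns (t, pending_subgoals,
    discounted return) triples, one per step, built by a backward scan."""
    targets = []
    g = 0.0
    pending = []  # subgoals satisfied at or after step t, in task order
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # standard discounted Monte Carlo return
        if subgoal_hits[t] is not None:
            pending.insert(0, subgoal_hits[t])
        targets.append((t, tuple(pending), g))
    return list(reversed(targets))

# Toy trajectory: subgoal 0 satisfied at step 2, subgoal 1 at step 4.
rewards = [0.0, 0.0, 1.0, 0.0, 1.0]
hits = [None, None, 0, None, 1]
for t, remaining, g in monte_carlo_targets(rewards, hits):
    print(t, remaining, round(g, 4))
```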
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
He, Frank S., Liu, Yang, Schwing, Alexander G., Peng, Jian
We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
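A rough sketch of the optimality-tightening idea follows; it is an assumed reading of the abstract, not the authors' code. The one-step TD loss is augmented with quadratic penalties whenever Q(s_t, a_t) falls below a multi-step lower bound built from future rewards, or rises above an upper bound built from preceding rewards. The bound lists and the penalty weight `lam` are illustrative.

```python
def tightening_loss(q, td_target, q_lower_bounds, q_upper_bounds, lam=4.0):
    """q: Q(s_t, a_t); td_target: r_t + gamma * max_a Q(s_{t+1}, a).
    q_lower_bounds: L_k = sum_{j<=k} gamma^j r_{t+j} + gamma^(k+1) * max_a Q(s_{t+k+1}, a),
    built from rewards observed after t; q_upper_bounds: analogous bounds
    derived from rewards observed before t."""
    td_loss = (q - td_target) ** 2
    lower_violation = max(0.0, max(q_lower_bounds) - q)  # q should be >= every L_k
    upper_violation = max(0.0, q - min(q_upper_bounds))  # q should be <= every U_k
    return td_loss + lam * (lower_violation ** 2 + upper_violation ** 2)

# Toy numbers: the lower bound 2.5 exceeds q = 2.0, so the penalty pushes q up.
print(tightening_loss(q=2.0, td_target=2.2,
                      q_lower_bounds=[1.8, 2.5], q_upper_bounds=[3.0]))
```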
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Illinois (0.05)